1.1  Introduction.

In  many  areas  of  science,  business,  industry  and  government,  optimizing 
techniques  are  commonly  applied  to  solve  routine  problems.  Topics  on 
optimization  have  become  an  important  area  of  study  in  disciplines  such 
as  Operations  Research,  Chemical  Engineering,  Electrical  Engineering  and 
Economics.  Mathematical  techniques  related  to  optimization  have  been 
developed over the past several hundred years and, with the application
of modern computers, these techniques are making an impact in many other
areas  of  science  and  engineering. 

Statistical  procedures  often  require  optimization  and  in  a  sense 
one may regard statistics as a subarea of optimization. There are so many
applications of optimizing methods in the major branches of statistics
that the study of optimization becomes an important area for the statistician.

The  variety  and  universality  of  the  use  of  the  optimization  techniques 
can  be  gauged  by  a  cursory  perusal  of  the  contents  of  the  two  volumes 
on  Optimizing  Methods  in  Statistics,  Rustagi  (1971,  1979).  The  purpose 
here  is  to  develop  a  logical  introduction  to  important  areas  of  optimization 
as  they  are  applied  to  statistical  problems.  Several  examples  are  given 
from  statistical  areas  where  optimizing  techniques  play  a  major  role  in 
their  solution.  We  consider  examples  from  Estimation,  Nonparametric 
Statistics,  Design  of  Regression  Experiments,  Sample  Surveys,  Multivariate 
Statistics,  Inference,  Information  Theory  and  Regression  Analysis  to 
motivate  the  study  of  optimization. 

The  scope  of  optimizing  techniques  is  fairly  extensive.  However, 
we  shall  put  emphasis  on  those  areas  of  optimization  which  find 
frequent applications in statistical problems. The classical techniques
of optimization will be discussed first and numerical methods of optimization
will be discussed next. Linear and nonlinear programming methods will
be described, and variational techniques having connections with dynamic
programming and Pontryagin's Principle will be discussed later. Applications
of  other  optimizing  techniques  such  as  those  of  Stochastic  Approximation 
will  also  be  included. 

1.2  Statistical  examples  using  classical  optimizing  techniques. 

Example: Let $X_1, X_2, \ldots, X_n$ be a random sample from a population
having a normal distribution with unknown mean $\mu$ and unknown variance $\sigma^2$.
The estimation of $\mu$ and $\sigma^2$ by the Method of Maximum Likelihood requires
maximizing $L(\mu, \sigma^2)$ where

$$L(\mu, \sigma^2) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{n} \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2\right]. \qquad (1.2.1)$$

In this problem, the solution can be obtained by simply equating the
partial derivatives of log L to zero and solving the resulting equations
to obtain the necessary conditions for an optimum.

Suppose that there is a constraint imposed on the parameter $\mu$, say that
$\mu$ is always positive. In this case, further attention is to be paid to
the process of optimization to obtain the estimate for $\mu$.
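As a rough illustration of how such a constraint changes the answer, the sketch below computes the unconstrained maximum likelihood estimates in closed form and then the estimates under a restriction on the mean. It assumes the closed restriction $\mu \ge 0$ (so that the maximum is attained) and relies on the fact that, for fixed $\sigma^2$, the log likelihood is a concave quadratic in $\mu$, so the restricted maximizer is the larger of the sample mean and zero. The function names and the data are hypothetical.

```python
def normal_mle(xs):
    """Unconstrained MLE of (mu, sigma^2) for a normal sample."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return mu, var


def normal_mle_nonneg_mean(xs):
    """MLE of (mu, sigma^2) under the assumed restriction mu >= 0.

    The log likelihood is concave in mu, so the restricted maximizer
    is the sample mean clipped at zero; sigma^2 is then re-estimated
    around the clipped mean.
    """
    n = len(xs)
    mu = max(sum(xs) / n, 0.0)
    var = sum((x - mu) ** 2 for x in xs) / n
    return mu, var


# Made-up sample whose mean is negative, so the constraint is active.
sample = [-1.2, 0.3, -0.8, 0.1, -0.4]
mu_free, var_free = normal_mle(sample)
mu_con, var_con = normal_mle_nonneg_mean(sample)
```

When the sample mean is negative, the restricted estimate of the mean sits on the boundary and the variance estimate increases accordingly, which is why the calculus argument alone is no longer sufficient.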

Example: Let $p_1, p_2, \ldots, p_k$, with $p_i \ge 0$ and $\sum_{i=1}^{k} p_i = 1$, be the probabilities
of a trial ending in k possibilities. A sample of n trials leads to
$x_1, x_2, \ldots, x_k$ occurrences of the various possibilities. The maximum likelihood
estimates of $p_1, p_2, \ldots, p_k$ are obtained by maximizing $L(p_1, \ldots, p_k)$ such that

$$L(p_1, p_2, \ldots, p_k) = p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k} \qquad (1.2.2)$$

with constraint $p_1 + p_2 + \cdots + p_k = 1$. The usual method of Lagrange's multiplier
rule is used to obtain the solution.

Suppose that there are inequality restrictions such as

$$p_1 \le p_2 \le \cdots \le p_k \qquad (1.2.3)$$

on the $p_i$'s. In that case the estimates have to be obtained by the modern methods
of programming.
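For the ordered case, one commonly used computational device is the pool-adjacent-violators algorithm. The sketch below rests on an assumption that is not stated in the text: that the order-restricted maximum likelihood estimates coincide with the equal-weight isotonic regression of the unrestricted estimates $x_i/n$. The function name `pava` and the counts are hypothetical.

```python
def pava(values):
    """Equal-weight isotonic (nondecreasing) regression by pooling
    adjacent violators. Each block stores [sum, count]."""
    blocks = []
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while the previous block mean exceeds the last one.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted


# Made-up multinomial counts that violate p1 <= p2 <= p3 <= p4.
counts = [5, 3, 2, 10]
n = sum(counts)
unrestricted = [c / n for c in counts]   # Lagrange solution x_i / n
restricted = pava(unrestricted)          # pooled to satisfy the ordering
```

Pooling replaces a run of violating cells by their average, so the fitted probabilities still sum to one.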

Example: Consider a normal p-variate population with mean $\mu$ and
covariance matrix $\Sigma$. Let $x_1, \ldots, x_N$ be a random sample from the distribution.
The logarithm of the likelihood is a constant multiple of $L(\mu, \Sigma)$ where

$$L(\mu, \Sigma) = -\log|\Sigma| - \operatorname{tr} \Sigma^{-1} V \qquad (1.2.4)$$

with $V = \frac{1}{N}\sum_{\alpha=1}^{N} (x_\alpha - \mu)(x_\alpha - \mu)'$.

Again differential calculus provides the maximum likelihood estimates of $\mu$
and $\Sigma$. Suppose an additional sample of size M is given on the first
k (k < p) of the components. Then the problem becomes more complicated.

A recent discussion is due to Anderson and Olkin (1979) for finding
maximum likelihood estimates of $\mu$ and $\Sigma$. Similar problems arise when
some of the components have missing observations.

Example: Bounds of the serial correlation coefficient for the time
series $x_1, x_2, \ldots$ are needed. The serial correlation coefficient of $\ell$th order is
defined by

$$r_\ell = \frac{n}{n-\ell}\,\frac{\sum_{t=1}^{n-\ell} x_t x_{t+\ell}}{\sum_{t=1}^{n} x_t^2}. \qquad (1.2.5)$$

Consider the upper bound of $r_\ell$ for $\ell = 1$. This is the problem of
maximizing $r_1$, which is equivalent to

$$\max \sum_{t=1}^{n-1} x_t x_{t+1}$$

subject to the constraint

$$\sum_{t=1}^{n} x_t^2 = \text{constant}.$$

Using Lagrange's method, the solution of the above problem can be obtained;
see Chanda (1962). Contrary to the usual belief that a correlation coefficient
lies between -1 and +1, the serial correlation coefficient as
defined in (1.2.5) does have bounds higher than 1.
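A small numerical check makes the last remark concrete. The sketch below assumes the definition $r_\ell = \frac{n}{n-\ell}\sum_{t=1}^{n-\ell} x_t x_{t+\ell} / \sum_{t=1}^{n} x_t^2$ and uses the standard fact that $\sum x_t x_{t+1}$ under a fixed sum of squares is maximized by $x_t \propto \sin(t\pi/(n+1))$, which gives the value $\cos(\pi/(n+1))$ for the ratio. The function name is hypothetical.

```python
import math


def serial_correlation(xs, lag):
    """Serial correlation of the given lag with the n/(n-lag) factor."""
    n = len(xs)
    num = sum(xs[t] * xs[t + lag] for t in range(n - lag))
    den = sum(x * x for x in xs)
    return (n / (n - lag)) * num / den


n = 10
theta = math.pi / (n + 1)
# Maximizing series: eigenvector of the tridiagonal form sum x_t x_{t+1}.
xs = [math.sin((t + 1) * theta) for t in range(n)]
r1 = serial_correlation(xs, 1)
bound = n / (n - 1) * math.cos(theta)   # upper bound of r_1 under this definition
```

For n = 10 the bound is about 1.066, so the coefficient can indeed exceed 1 under this definition.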

Example: (Constrained regression)

The multiple regression model generally assumes that

$$y = X\beta + e$$

where $y$ is an n×1 vector, $X$ is an n×p matrix of known constants, $\beta$ is a
p×1 vector of parameters and $e$ is an n×1 random vector of errors, with
mean $0$ and covariance $\sigma^2 I$. The least squares estimates of $\beta$ are
obtained by

$$\min_{\beta}\,(y - X\beta)'(y - X\beta). \qquad (1.2.6)$$

However, when $\beta$ is constrained so as to be in a specified set, e.g.
$\beta \ge \beta_0$, we have a constrained optimization problem and in most cases such
an optimization problem requires the use of modern programming methods.
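One simple way to handle lower bounds of the form $\beta \ge \beta_0$ numerically is projected gradient descent, sketched below. This is an illustrative technique chosen for brevity and is not a method prescribed by the text; the function name, the step-size rule (one over twice the trace of $X'X$, which does not exceed the reciprocal of the gradient's Lipschitz constant) and the data are assumptions.

```python
def constrained_least_squares(X, y, lower, steps=2000):
    """Minimize (y - X b)'(y - X b) subject to b >= lower (componentwise)
    by projected gradient descent on plain Python lists."""
    n, p = len(X), len(X[0])
    # Gram matrix X'X and vector X'y.
    xtx = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
           for i in range(p)]
    xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(p)]
    # Conservative step: trace(X'X) bounds the largest eigenvalue from above.
    step = 1.0 / (2.0 * sum(xtx[i][i] for i in range(p)))
    beta = list(lower)
    for _ in range(steps):
        grad = [2.0 * (sum(xtx[i][j] * beta[j] for j in range(p)) - xty[i])
                for i in range(p)]
        # Gradient step followed by projection onto the feasible set.
        beta = [max(lower[i], beta[i] - step * grad[i]) for i in range(p)]
    return beta


# Made-up data: the unconstrained solution has a negative second component.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, -1.0, 0.5]
beta_hat = constrained_least_squares(X, y, lower=[0.0, 0.0])
```

In this example the second coefficient is held at its bound and the first is re-estimated, giving (0.75, 0) instead of the unconstrained (7/6, -5/6).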
Example: (Optimal allocation in survey sampling)

A large number of problems in survey sampling require optimum
allocation of resources since the surveys are constrained by total cost,
time or the sampling units. Consider for example the simple case in
cluster sampling where one is interested in determining an optimal size
of a cluster which produces the minimum variance of the sample mean for
a given cost.

Suppose M is the total number of units to be divided among N clusters
of size $M_0$ each. Let $S_W^2$ be the within-cluster variance and $S_B^2$ be the between-
cluster variance. Let the sample selected be of size n. Let $C_B$ be the
cost associated with a cluster regardless of its size
and $C_W$ be the cost associated with each element regardless of cluster
size. Then for a fixed cost C, we have

$$C = nC_B + nM_0 C_W$$

and we want to minimize the variance of the overall average, $\bar{y}$,

$$V(\bar{y}) = \left(1 - \frac{n}{N}\right)\frac{S_W^2 + M_0 S_B^2}{nM_0}. \qquad (1.2.7)$$

The optimal solution turns out to be

$$M_{\text{opt}} = \sqrt{\frac{C_B\,S_W^2}{C_W\,S_B^2}}. \qquad (1.2.8)$$


For  further  details  and  other  problems  in  sampling  using  optimization 
methods, the reader may refer to Jessen (1978) and Cochran (1963).


1.3  Statistical  examples  using  numerical  techniques. 

Example: Consider the problem of estimation of the parameters of the
Gamma distribution

$$f(x) = \begin{cases} \dfrac{e^{-x/\beta}\,x^{\alpha-1}}{\Gamma(\alpha)\,\beta^{\alpha}}, & x > 0 \\ 0, & \text{elsewhere.} \end{cases} \qquad (1.3.1)$$

The maximum likelihood estimates are given by equating to zero the partial
derivatives of log L with respect to $\alpha$ and $\beta$, where

$$\log L = -\frac{\sum x_i}{\beta} + (\alpha-1)\sum \log x_i - n\log\Gamma(\alpha) - n\alpha\log\beta. \qquad (1.3.2)$$

The equations are

$$\bar{x} - \alpha\beta = 0 \qquad (1.3.3)$$

and

$$\sum \log x_i - n\,\frac{\Gamma'(\alpha)}{\Gamma(\alpha)} - n\log\beta = 0. \qquad (1.3.4)$$

These equations can be solved only through numerical methods. Tables of
the digamma function $\Gamma'(\alpha)/\Gamma(\alpha)$ are available, Pearson and Hartley (1954),
to facilitate the solution.
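With a computer, the tables can be replaced by a direct evaluation of the digamma function. The sketch below substitutes $\beta = \bar{x}/\alpha$ from the first likelihood equation into the second, which leaves the single equation $\log\alpha - \psi(\alpha) = \log\bar{x} - \frac{1}{n}\sum\log x_i$ in $\alpha$, and solves it by bisection. The digamma approximation (recurrence plus a short asymptotic series), the bracketing interval, the function names and the data are all assumptions made for illustration.

```python
import math


def digamma(x):
    """psi(x) = Gamma'(x)/Gamma(x) for x > 0: shift x upward with the
    recurrence psi(x) = psi(x+1) - 1/x, then use an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series


def gamma_mle(xs, lo=1e-3, hi=1e3, iters=200):
    """Solve the two likelihood equations for (alpha, beta) by bisection."""
    n = len(xs)
    mean = sum(xs) / n
    mean_log = sum(math.log(x) for x in xs) / n
    target = math.log(mean) - mean_log      # nonnegative by Jensen's inequality
    for _ in range(iters):
        mid = math.sqrt(lo * hi)            # bisect on the log scale
        # log(a) - psi(a) decreases in a, so move right while it is too large.
        if math.log(mid) - digamma(mid) > target:
            lo = mid
        else:
            hi = mid
    alpha = math.sqrt(lo * hi)
    return alpha, mean / alpha


data = [0.8, 1.5, 2.3, 3.1, 0.6, 1.9, 4.2, 2.7]   # made-up observations
alpha_hat, beta_hat = gamma_mle(data)
```

The bisection is used here only because it is easy to verify; Newton's method on the same equation would converge faster.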

Example: (Survival Analysis)

Suppose the times to death of an individual follow an exponential
distribution with parameter $\lambda$. The times $t_1, t_2, \ldots, t_k$ of the k deaths
observed out of n individuals under study are known. The study
is to be analyzed at time a. The estimate of the parameter $\lambda$ is obtained
by considering the following likelihood

$$L = \lambda^{k}\, e^{-\lambda \sum_{i=1}^{k} t_i}\,(1 - e^{-\lambda a})^{n-k} \qquad (1.3.5)$$

or

$$\log L = k\log\lambda - \lambda\sum_{i=1}^{k} t_i + (n-k)\log(1 - e^{-\lambda a}).$$

The likelihood equation for $\lambda$ is

$$\frac{k}{\lambda} - \sum_{i=1}^{k} t_i + \frac{(n-k)\,a\,e^{-\lambda a}}{1 - e^{-\lambda a}} = 0. \qquad (1.3.6)$$

The above equation admits only a numerical solution for $\lambda$. The roots of
the likelihood equations lead to the maximum likelihood estimates.
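A minimal sketch of such a numerical solution follows. It takes the likelihood equation in the form $k/\lambda - \sum t_i + (n-k)\,a\,e^{-\lambda a}/(1-e^{-\lambda a}) = 0$, with k the number of observed deaths, and finds the root by bisection; the left-hand side is decreasing in $\lambda$, so the root is unique. The function names, the bracketing interval and the data are hypothetical.

```python
import math


def likelihood_equation(lam, times, n, a):
    """Left-hand side of the likelihood equation for lambda."""
    k = len(times)
    # -expm1(-x) equals 1 - exp(-x) but is accurate for small x.
    tail = (n - k) * a * math.exp(-lam * a) / (-math.expm1(-lam * a))
    return k / lam - sum(times) + tail


def solve_lambda(times, n, a, lo=1e-8, hi=1e3, iters=200):
    """Bisection: the function is positive near 0 and negative for large lambda."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if likelihood_equation(mid, times, n, a) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


death_times = [0.5, 1.2, 2.0, 3.1]   # made-up times of the observed deaths
lam_hat = solve_lambda(death_times, n=10, a=4.0)
```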

Example: (Reliability)

Realistic models in reliability theory and survival analysis require
numerical evaluation frequently. Consider the three parameter Weibull
distribution model for the time to failure, T, for a given individual. The
probability density function is given by

$$f_T(t) = \begin{cases} \dfrac{\beta}{\delta}\left(\dfrac{t-\mu}{\delta}\right)^{\beta-1} \exp\left[-\left(\dfrac{t-\mu}{\delta}\right)^{\beta}\right], & t \ge \mu \\ 0, & t < \mu. \end{cases} \qquad (1.3.7)$$

Here $\beta, \delta > 0$ and $\mu \ge 0$.

Suppose the experiment is conducted over the period $(0, t_0)$ and the times
of failures of k individuals out of n on test are given by $t_1, \ldots, t_k$.

Then the likelihood of the sample is given by

t  _  t!  *°  *2  t.-y  ^ 

6  i=l  ^ 


*0  t.-y  8 

r  (-4—) 


(1.3.6) 


t0-U 

1  -  expC-  (~g- . -)  ] 

The maximum likelihood estimates of $\mu$, $\beta$ and $\delta$ can only be obtained
numerically. Several such procedures are available in the literature.

Mann, Schafer and Singpurwalla (1974) provide many other models in reliability
and survival analysis leading to numerical procedures.

Example: (Curve fitting problem)

Suppose measurements of neutron flux (y) are made in a nuclear
reactor at various points (x) and the curve to be fitted is

$$y(x) = A\cos(Bx+E) + C\cosh(Dx+E). \qquad (1.3.9)$$

At points $x_i$, $i = 1, 2, \ldots, n$, adjusted counts $y_i$ were made, using the
assumption that the variance of the counts is proportional to its mean. It is of
interest to find A, B, C, D and E such that we minimize S with

$$S = \sum_{i=1}^{n} \frac{[\,y_i - A\cos(Bx_i+E) - C\cosh(Dx_i+E)\,]^2}{a_i\,(A\cos(Bx_i+E) + C\cosh(Dx_i+E))}$$

or to simplify the problem by minimizing $S^*$ with

$$S^* = \sum_{i=1}^{n} (a_i y_i)^{-1}\,[\,y_i - A\cos(Bx_i+E) - C\cosh(Dx_i+E)\,]^2.$$

The nonlinear form of the function does not allow us to obtain estimates
of A, B, C, D and E in closed form. Hooke and Jeeves (1961) provide a
numerical method by the "Direct Search" technique for this optimization
problem.
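The flavor of a direct search can be conveyed by a short pattern-search routine. The sketch below is a simplified variant in the spirit of Hooke and Jeeves (exploratory moves along each coordinate, followed by pattern moves, with step halving on failure); it is not claimed to reproduce their procedure in detail, and the function names, default parameters and test function are assumptions.

```python
def explore(f, base, fbase, step):
    """Try +step and -step along each coordinate, keeping improvements."""
    x, fx = list(base), fbase
    for i in range(len(x)):
        for delta in (step, -step):
            trial = list(x)
            trial[i] += delta
            ft = f(trial)
            if ft < fx:
                x, fx = trial, ft
                break
    return x, fx


def hooke_jeeves(f, x0, step=1.0, shrink=0.5, tol=1e-8, max_iter=10000):
    """Derivative-free minimization by exploratory and pattern moves."""
    base, fbase = list(x0), f(list(x0))
    for _ in range(max_iter):
        if step < tol:
            break
        x, fx = explore(f, base, fbase, step)
        if fx < fbase:
            while True:
                # Pattern move: jump along the direction of the last success.
                pattern = [2.0 * xi - bi for xi, bi in zip(x, base)]
                base, fbase = x, fx
                x2, fx2 = explore(f, pattern, f(pattern), step)
                if fx2 < fbase:
                    x, fx = x2, fx2
                else:
                    break
        else:
            step *= shrink   # no improvement: refine the mesh
    return base, fbase


def quadratic(v):
    """Simple test function with minimum 0 at (1, -3)."""
    return (v[0] - 1.0) ** 2 + 2.0 * (v[1] + 3.0) ** 2


best, fbest = hooke_jeeves(quadratic, [0.0, 0.0])
```

To apply the same routine to the flux problem, one would pass a function of the five parameters (A, B, C, D, E) that evaluates the sum of squares for the observed data; like any local search, the result would depend on the starting point.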

Example: (Response surface designs)

Consider the following relationship between the mean $\mu$ of a response
variable y and the independent variables $x$ with unknown parameters $\theta$,

$$\mu(x) = f(x, \theta),$$

where $x = (x_1, \ldots, x_p)'$ is a p-dimensional "design" variable and
$\theta = (\theta_1, \ldots, \theta_q)'$ is a q-dimensional parameter. Since the function f is
generally unknown, it is estimated by a polynomial of a certain degree and
then the estimates of $\theta$ are obtained by some method of estimation.
Let $\hat{y}(x)$ = estimated response at $x$.

The problem in response surface designs is to find that design, that is,
$x$, such that we minimize

$$J = c \int (\hat{y}(x) - \mu(x))^2\,dx$$

over the class of all $x$.

Several  approaches  are  available  in  the  literature.  Initial  impetus 
was  provided  by  Box  and  Wilson  (1951), 




Let $n_i$ be the number of units in the sample with values $y_i$, so that under
simple random sampling of size n we have $n = \sum n_i$. The maximum likelihood
estimate of y is obtained by maximizing the likelihood


L  = 


T 

n 


i=lln. 


(1.4.4) 


with $n = \sum n_i$ being the total sample size. The optimization in this case
reduces to an integer programming problem; see Hartley and Rao (1969) in their
paper on a new estimation theory of sample surveys. For other problems
in survey sampling using mathematical programming, see Rao (1979).

Example:  (Design  of  experiments) 

An  important  class  of  designs  is  concerned  with  factorial  experiments. 
When  the  number  of  factors  is  large,  all  treatment  combinations  cannot 
be  used  in  a  block  of  ordinary  size  and  hence  fractional  factorial  designs 
have been developed. A recent introduction in the study of fractional
factorials is the concept of cost optimality. The problems of finding
cost optimal fractional factorials naturally lead to programming problems;
see Neuhardt and Mount-Campbell (1978).

Example: (Least absolute value estimate in two-way classification).
Consider a two-way classification model

$$y_{ijk} = \mu + \alpha_i + \beta_j + e_{ijk} \qquad (1.4.5)$$

where i and j index the levels of the first and second factors and
$k = 1, \ldots, n$, with $\sum \alpha_i = \sum \beta_j = 0$. $y_{ijk}$ can be regarded as the kth
observation at the ith level of the first factor and the jth level of the second factor.

We obtain the least absolute value estimates of $\mu$, $\alpha_i$, $\beta_j$ by minimizing

$$\sum_i \sum_j \sum_k |\,y_{ijk} - \mu - \alpha_i - \beta_j\,|. \qquad (1.4.6)$$

This problem is equivalent to the following linear programming problem.

$$\text{Minimize} \quad \sum_i \sum_j \sum_k (d_{ijk}^{+} + d_{ijk}^{-}) \qquad (1.4.7)$$

subject to

$$\mu + \alpha_i + \beta_j + d_{ijk}^{+} - d_{ijk}^{-} = y_{ijk}, \qquad d_{ijk}^{+} \ge 0, \qquad d_{ijk}^{-} \ge 0. \qquad (1.4.8)$$

A large number of other applications of programming methods to Least Absolute
Value Estimation are found in Gentle (1977).

Example: (Estimation of Markov chain probabilities)

Consider the problem of estimating the transition probabilities
$p_{ij}(t)$ of the Markov chain $x_t$, $t = 1, 2, \ldots, T$, and $i, j = 1, 2, \ldots, r$. Here

$$p_{ij}(t) = \Pr\{x_t = s_j \mid x_{t-1} = s_i\} \qquad (1.4.9)$$

where $s_i$, $i = 1, 2, \ldots, r$, are the finite number of the states of the chain.
Here

$$\sum_j p_{ij}(t) = 1 \qquad (1.4.10)$$

and

$$0 \le p_{ij}(t) \le 1. \qquad (1.4.11)$$

Suppose the chain is observed for N(t) independent trials. Let $w_j(t)$ be
the proportion of events which fall in the jth category. The likelihood
of the sample, then, can be obtained as follows:

$$L = \prod_{t=1}^{T} \frac{N(t)!}{\prod_m \big(N(t)w_m(t)\big)!\;\big(N(t) - \sum_k N(t)w_k(t)\big)!}\; \prod_j \Big(\sum_i w_i(t-1)\,p_{ij}(t)\Big)^{N(t)w_j(t)} \Big(1 - \sum_k \sum_i w_i(t-1)\,p_{ik}(t)\Big)^{N(t) - \sum_k N(t)w_k(t)} \qquad (1.4.12)$$

The problem of maximizing the likelihood in (1.4.12) subject to (1.4.10) and (1.4.11)
is a nonlinear programming problem. This problem with other manifestations
is studied by Lee, Judge and Zellner (1968).

1.5  Variational  methods  in  Statistics. 

Classical methods based on the calculus of variations have been used
extensively in applications, especially for studying physical and mechanical
systems. Their use in statistics and economics has resulted in new
developments of variational methodology. Variational methods are concerned
with the optimization of functionals over a class of functions, such as
minimizing or maximizing integrals of functions over a class of functions
subject to certain constraints. In statistics there are many applications
which depend very heavily on variational techniques. A recent book on the
topic is by Rustagi (1976). We provide a few examples from statistics
using classical and modern variational techniques.

Example: (Order statistics)

Suppose $X_1 \le X_2 \le \cdots \le X_n$ is an ordered random sample from a continuous
distribution function F(x). The expectation of the largest order statistic
$X_n$ is given by

$$L(F) = \int x\,d(F^n(x)). \qquad (1.5.1)$$

An important problem in utilizing order statistics is to find upper and
lower bounds of L(F) when the mean and variance (say) of the random
variable X are given.

Similarly one may want to find the bounds of the expectation of the
range, $X_n - X_1$, of the sample. That is,

$$\min(\max) \int x\,d\{F^n(x) - 1 + (1-F(x))^n\} \qquad (1.5.2)$$

subject to certain constraints.
Such problems occur in nonparametric statistical inference and various
generalizations have been discussed by Rustagi (1957).
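To see what the functional L(F) looks like numerically, the sketch below approximates $\int x\,d(F^n(x))$ by a Stieltjes sum for the uniform distribution on (0, 1), for which the exact value is n/(n+1). It then compares the result with the classical upper bound $\mu + \sigma(n-1)/\sqrt{2n-1}$ for given mean and variance; that bound is quoted here as an assumption and is not derived in the text. The function names and the grid size are hypothetical.

```python
def expected_largest(F, lower, upper, n, cells=20000):
    """Approximate L(F) = integral of x d(F(x)^n) over [lower, upper]
    by a midpoint Stieltjes sum."""
    h = (upper - lower) / cells
    total = 0.0
    prev = F(lower) ** n
    for i in range(cells):
        right = lower + (i + 1) * h
        cur = F(right) ** n
        total += (right - 0.5 * h) * (cur - prev)   # midpoint times mass of F^n
        prev = cur
    return total


def uniform_cdf(x):
    """Distribution function of the uniform law on (0, 1)."""
    return min(max(x, 0.0), 1.0)


n = 5
value = expected_largest(uniform_cdf, 0.0, 1.0, n)     # exact value is 5/6
mean, sd = 0.5, (1.0 / 12.0) ** 0.5
bound = mean + sd * (n - 1) / (2 * n - 1) ** 0.5       # assumed classical bound
```

For the uniform distribution the expectation (about 0.833) falls below the bound (about 0.885), as it should, since the uniform law is not the extremal distribution.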


Example: (Mann-Whitney-Wilcoxon statistic).

Suppose we are interested in the bounds of the variance of the Mann-
Whitney-Wilcoxon statistic for various applications such as finding
confidence intervals for p = Pr(X < Y). The integral we minimize (maximize)
reduces to

$$I(F) = \int (F(x) - kx)^2\,dx \qquad (1.5.3)$$

subject to the condition

$$\int F(x)\,dx = 1 - p. \qquad (1.5.4)$$

This is a variational problem and has been treated in detail by Rustagi (1961).

Example: (Efficiency of tests).

Consider a random sample from a population having a continuous
distribution function F(x). Suppose we are interested in testing the
hypothesis

$$H_0: F(x) = G(x)$$
vs.
$$H_1: G(x) = F(x - \theta)$$

with $\theta$ as some location parameter. The relative asymptotic efficiency
of the Wilcoxon test with respect to the t-test (which would be used if F and G were
normal distributions) depends on f through

$$I(f) = \int f^2(x)\,dx \qquad (1.5.5)$$

where f(x) is the corresponding probability density function of X.

A problem of interest in nonparametric inference is to find bounds
of I(f) subject to side conditions such as

$$\int f(x)\,dx = 1, \qquad \int x f(x)\,dx = 0.$$

For details, the reader is referred to Hodges and Lehmann (1956).
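The role of I(f) can be checked numerically. The sketch below assumes the usual expression for the efficiency of the Wilcoxon test relative to the t-test, $12\sigma^2 I(f)^2$, where $\sigma^2$ is the variance of f; this expression is standard but is not written out in the text, so it is treated here as an assumption. For the standard normal density it should return $3/\pi \approx 0.955$. The function names and the integration grid are hypothetical.

```python
import math


def integral_f_squared(pdf, lower, upper, cells=20000):
    """Midpoint-rule approximation of the integral of f(x)^2."""
    h = (upper - lower) / cells
    return sum(pdf(lower + (i + 0.5) * h) ** 2 for i in range(cells)) * h


def normal_pdf(x):
    """Standard normal density (mean 0, variance 1)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


i_normal = integral_f_squared(normal_pdf, -10.0, 10.0)
variance = 1.0
are_normal = 12.0 * variance * i_normal ** 2   # assumed efficiency formula
```

The variational problem in the text asks how small this quantity can become as f ranges over all densities satisfying the side conditions.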

Example: (Regression designs)

Consider a simple linear regression model,

$$y = \alpha + \beta x + e$$

where $\alpha$, $\beta$ are the unknown parameters, x is the independent variable
and e is the error with mean 0 and variance $\sigma^2$. In regression design
of experiments, the investigator is interested in allocating n observations
at $x_1, \ldots, x_n$ so as to optimize a certain function of the estimated parameters.

A common criterion of optimality is D-optimality, where the determinant
of the covariance matrix of the estimated vector of parameters is optimized.
For example, given a sample of size n, the covariance matrix of $(\hat{\alpha}, \hat{\beta})'$ is $M\sigma^2$, where

$$M = \begin{pmatrix} \dfrac{\sum x_i^2}{n\sum (x_i-\bar{x})^2} & \dfrac{-\sum x_i}{n\sum (x_i-\bar{x})^2} \\[2ex] \dfrac{-\sum x_i}{n\sum (x_i-\bar{x})^2} & \dfrac{1}{\sum (x_i-\bar{x})^2} \end{pmatrix}. \qquad (1.5.6)$$

Assuming that $\sigma$ is known, the problem of optimal regression design is
to find $x_1, x_2, \ldots, x_n$ such that the determinant of M is minimized. There
are many other criteria of optimality; a detailed account is available in
Federov (1971).
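For the simple linear model the determinant has a closed form, $\det M = 1/(n\sum(x_i-\bar{x})^2)$, so D-optimality here amounts to making this determinant of the covariance factor as small as possible, that is, spreading the design points as far apart as the design region allows. The sketch below compares two designs with n = 6 points; the design region [-1, 1] is an assumption introduced only for the illustration, and the function name is hypothetical.

```python
def det_M(xs):
    """Determinant of M = (X'X)^{-1} for the model y = alpha + beta*x + e,
    which equals 1 / (n * sum (x_i - xbar)^2)."""
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    return 1.0 / (n * sxx)


# Two candidate designs on the assumed region [-1, 1].
equally_spaced = [-1.0, -0.6, -0.2, 0.2, 0.6, 1.0]
endpoints = [-1.0, -1.0, -1.0, 1.0, 1.0, 1.0]

det_spaced = det_M(equally_spaced)   # 1 / 16.8
det_ends = det_M(endpoints)          # 1 / 36
```

Placing half of the observations at each end of the region gives the smaller determinant, which is the familiar D-optimal design for a straight line.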

Example: (Robustness)

M-estimates of a location parameter, $\theta$, for a probability density
function $f(x-\theta)$, with cumulative distribution function $F(x-\theta)$, are defined
by Huber (1972). A statistic $T_n$ based on the random sample $X_1, X_2, \ldots, X_n$
from $f(x-\theta)$ is an M-estimate if it maximizes $\sum_{i=1}^{n} \rho(x_i - T_n)$ for some metric $\rho$.
$T_n$ is given by the equation (1.5.7),

$$\sum_{i=1}^{n} \psi(x_i - T_n) = 0 \qquad (1.5.7)$$

with $\psi = \rho'$. Note that if $\rho(x) = -x^2$, we get
least squares estimates and if $\psi(x) = -f'(x)/f(x)$, we get maximum likelihood
estimates. It has been shown by Huber under fairly general conditions
that the asymptotic variance of $T_n$ is $V(\psi, F)$, where

$$V(\psi, F) = \frac{\int \psi^2\,dF}{\left(\int \psi'\,dF\right)^2} \qquad (1.5.8)$$

with $T_n \to T$ almost surely as $n \to \infty$. The problem is to find an $F_0$ over the
given class of functions which minimizes $V(\psi, F)$. This reduces to a variational
problem. Uniqueness and existence of the solutions have been discussed
by Huber (1972). Many other problems related to robustness studies
leading to the applications of variational methods have been recently
discussed by Bickel (1965), Portnoy (1977), and Collins and Portnoy (1979).
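The defining equation $\sum \psi(x_i - T_n) = 0$ is easy to solve numerically when $\psi$ is nondecreasing. The sketch below uses Huber's clipped-linear $\psi$ with tuning constant 1.345 and unit scale; both the particular $\psi$ and the constant are assumptions chosen for illustration, as are the function names and the data. Bisection applies because the sum is nonincreasing in T.

```python
def huber_psi(u, c=1.345):
    """Clipped-linear psi: identity on [-c, c], constant outside."""
    return max(-c, min(c, u))


def m_estimate(xs, psi=huber_psi, iters=200):
    """Solve sum psi(x_i - T) = 0 for T by bisection between min and max."""
    lo, hi = min(xs), max(xs)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(psi(x - mid) for x in xs) > 0:
            lo = mid      # T is too small: residuals are mostly positive
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Made-up sample with one gross outlier.
sample = [1.0, 1.2, 0.9, 1.1, 0.8, 15.0]
t_huber = m_estimate(sample)                     # about 1.269
t_mean = m_estimate(sample, psi=lambda u: u)     # psi(u) = u gives the mean
```

With $\psi(u) = u$ the equation returns the sample mean (about 3.33), while the clipped $\psi$ limits the influence of the outlier and gives an estimate near the bulk of the data.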

Example: (Admissibility)

Let $p_\theta(x)$ be the m-dimensional multivariate normal density of a
random vector $x$. Let $\delta(x)$ be an estimator of $\theta$ and let the loss function be
$L(\theta, \delta)$, with $\Sigma$ as a known matrix.

Suppose $G(\theta)$ is the prior distribution function on $\theta$, the risk is

$$R(\theta, \delta) = E_\theta\{L(\theta, \delta(x))\}, \qquad (1.5.9)$$

and the Bayes risk is denoted by

$$B(G, \delta) = \int R(\theta, \delta)\,G(d\theta). \qquad (1.5.10)$$

Then the Bayes estimator with prior $G(\theta)$ is given by

$$\delta_G(x) = \frac{\int \theta\,p_\theta(x)\,G(d\theta)}{\int p_\theta(x)\,G(d\theta)}, \qquad (1.5.11)$$

which can also be expressed through $g^*(x) = \int p_\theta(x)\,G(d\theta)$ and the
gradient vector $\nabla g^*(x)$ of $g^*(x)$.

The sufficient condition for an estimator $\delta_F(x)$ to be admissible is
the following:

"There exist non-negative finite Borel measures $G_i$, $i = 1, 2, \ldots$,
having compact support with $G_i(\{0\}) = 1$, such that

$$B(G_i, \delta_F) - B(G_i, \delta_{G_i}) \to 0 \qquad (1.5.12)$$

as $i \to \infty$."

The above condition (1.5.12) reduces to a condition on

$$I(g_i^*, f^*) = \int \left\| \nabla \left( \frac{g_i^*(x)}{f^*(x)} \right)^{1/2} \right\|^2 f^*(x)\,dx. \qquad (1.5.13)$$

Minimizing $I(g^*, f^*)$ answers the problem of admissibility of the estimator
$\delta_F(x)$ using techniques of calculus of variations. An elaborate account
is in Brown (1971).

Example: (Penalized maximum likelihood estimation)

For various reasons, the estimation of the probability density function
f(x) based on a sample $X_1, \ldots, X_n$ is made using a known penalty function
$\Phi(f)$.

Let the likelihood (penalized) be

$$L(f) = \prod_{i=1}^{n} f(x_i)\,e^{-\Phi(f)}. \qquad (1.5.14)$$

The problem of finding penalized maximum likelihood estimates is to find
max L(f) subject to constraints,

$$\int f(x)\,dx = 1$$

and $f(x) \ge 0$.

This optimization problem reduces to a problem in variational methods.

Detailed discussion of this and related problems is given by De Montricher,
Tapia and Thompson (1975).